Abstract

The convergence properties of the Iterative Water-Filling (IWF) based algorithms ([1], [2], [3]) have
been derived in the ideal situation where the transmitters in the network are able to obtain the exact
value of the interference plus noise (IPN) experienced at the corresponding receivers in each iteration of
the algorithm. However, these algorithms are not robust because they diverge when there is time-varying
estimation error of the IPN, a situation that arises in real communication systems. In this correspondence,
we propose an algorithm that possesses convergence guarantees in the presence of various forms of such
time-varying error. Moreover, we also show by simulation that in scenarios where the interference is
strong, the conventional IWF diverges while our proposed algorithm still converges.

I. Introduction 



A. The IWF Algorithm

The Iterative Water-Filling algorithm was first proposed by Yu et al. in [1] to solve the power
allocation problem in DSL networks, and it has since been applied to various areas in communications
and signal processing to obtain solutions for network power allocation problems (see, e.g., [3], [4], [5],
[6] and the references therein).

We consider an application of the IWF algorithm to the resource allocation problem in a wireless
communication network, where there are $N$ users and $K$ subchannels; each user is a transmitter-receiver
pair that tries to communicate with each other. Define the sets $\mathcal{N} \triangleq \{1, \cdots, N\}$ and
$\mathcal{K} \triangleq \{1, \cdots, K\}$; let $\{S_i\}_{i\in\mathcal{N}}$ denote the set of users in the network; let $p_i(k)$
denote the amount of power $S_i$ transmits on channel $k$; let $\mathbf{p}_i \triangleq [p_i(1), \cdots, p_i(K)]^T$,
$\mathbf{p}_{-i} \triangleq [\mathbf{p}_1, \cdots, \mathbf{p}_{i-1}, \mathbf{p}_{i+1}, \cdots, \mathbf{p}_N]^T$ and $\mathbf{p} \triangleq [\mathbf{p}_1, \cdots, \mathbf{p}_N]^T$.

The channel gain between the transmitter of $S_i$ and the receiver of $S_j$ on channel $k$ is denoted by
$|H_{i,j}(k)|^2$. The power of the environmental noise experienced at $S_i$'s receiver on channel $k$ is denoted
by $n_i(k)$. We assume that there is no interference cancellation performed at the receivers, and the
interference caused by the other users is considered as noise. Then the signal to interference plus noise
ratio (SINR) measured at the receiver of $S_i$ on channel $k$ can be expressed as:

$$SINR_i(k) = \frac{|H_{i,i}(k)|^2 p_i(k)}{n_i(k) + \sum_{j \neq i} |H_{j,i}(k)|^2 p_j(k)}.$$

Using Shannon's capacity, the maximum transmission rate achievable for $S_i$ can be expressed as:
$R_i(\mathbf{p}_i, \mathbf{p}_{-i}) = \sum_{k=1}^{K} \log(1 + SINR_i(k))$. We consider the following constraints for each user:
C-1) each $S_i$ has a limited power budget, i.e., $0 \le \sum_{k=1}^{K} p_i(k) \le \bar{p}_i$, $\forall\, i \in \mathcal{N}$;
C-2) we require $0 \le p_i(k) \le P_{mask}(k)$, $\forall\, k \in \mathcal{K}$ and $i \in \mathcal{N}$.
As such, we use $\mathcal{P}_i$ to denote the set of feasible power allocations for $S_i$:
$\mathcal{P}_i \triangleq \{\mathbf{p}_i : \sum_{k=1}^{K} p_i(k) \le \bar{p}_i,\ 0 \le p_i(k) \le P_{mask}(k),\ \forall\, k \in \mathcal{K}\}$.

Dynamic power allocation in this network can be formulated as a non-cooperative game where each
user $S_i$ is interested in maximizing its own rate when deciding how to allocate its power across the
spectrum, i.e., $S_i$ wants to find $\mathbf{p}_i^* \in \mathcal{P}_i$ such that
$\mathbf{p}_i^* \in \arg\max_{\mathbf{p}_i \in \mathcal{P}_i} R_i(\mathbf{p}_i, \mathbf{p}_{-i})$. A Nash Equilibrium
(NE) can be expressed as the set of power profiles $\{\mathbf{p}_i^*\}_{i\in\mathcal{N}}$ satisfying the set of conditions:
$\mathbf{p}_i^* \in \arg\max_{\mathbf{p}_i \in \mathcal{P}_i} R_i(\mathbf{p}_i, \mathbf{p}_{-i}^*)$, $\forall\, i \in \mathcal{N}$. The IWF and its various
extensions are essentially policies for the players to jointly reach a NE of this game in a distributed manner.

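In the IWF, the transmitters iteratively adjust their transmission power levels to maximize their own
transmission rate. Specifically, in iteration $t+1$, each user $S_i$ computes $\{p_i^{t+1}(k)\}_{k\in\mathcal{K}}$ as follows:

$$\mathbf{p}_i^{t+1} = \arg\max_{\mathbf{p}_i \in \mathcal{P}_i} R_i(\mathbf{p}_i, \mathbf{p}_{-i}^t), \quad \text{i.e.,} \quad p_i^{t+1}(k) = \left[\sigma_i - IPN_i^t(k)\right]_0^{P_{mask}(k)} \triangleq \Phi_i^k(\mathbf{p}_{-i}^t) \tag{1}$$

where $[x]_0^{b} \triangleq \min\{\max\{x, 0\}, b\}$; $\sigma_i$ is the dual variable associated with the total power
constraint for user $i$, and it is also referred to as the "water level" in the traditional water-filling
algorithm; $|\bar{H}_{j,i}(k)|^2$ and $\bar{n}_i(k)$ are defined as $|\bar{H}_{j,i}(k)|^2 \triangleq \frac{|H_{j,i}(k)|^2}{|H_{i,i}(k)|^2}$ and
$\bar{n}_i(k) \triangleq \frac{n_i(k)}{|H_{i,i}(k)|^2}$, respectively; $IPN_i^t(k)$ is defined as the normalized total interference
plus noise (IPN) for user $S_i$ on channel $k$ at time $t$: $IPN_i^t(k) \triangleq \bar{n}_i(k) + \sum_{j \neq i} |\bar{H}_{j,i}(k)|^2 p_j^t(k)$.
This quantity is measured at the receivers and fed back to their corresponding transmitters in each
iteration $t$ before $p_i^{t+1}(k)$ is computed. Define $\Phi_i(\mathbf{p}_{-i}) \triangleq [\Phi_i^1(\mathbf{p}_{-i}), \cdots, \Phi_i^K(\mathbf{p}_{-i})]^T$, and let
$\Phi(\mathbf{p}) \triangleq [\Phi_1(\mathbf{p}_{-1}), \cdots, \Phi_N(\mathbf{p}_{-N})]^T$. The function $\Phi(\cdot)$ is called the water-filling operator of the
system, and the IWF algorithm can be written concisely as $\mathbf{p}^{t+1} = \Phi(\mathbf{p}^t)$. If the algorithm reaches a
power profile $\mathbf{p}^*$ such that $\mathbf{p}^* = \Phi(\mathbf{p}^*)$, we say the IWF converges.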
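
The operator in (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `water_fill` and `iwf_step`, the nested-list layout `gains[j][i][k]` for the normalized gains, and the use of bisection to find the water level are all assumptions made here for concreteness.

```python
from typing import List


def water_fill(ipn: List[float], budget: float, mask: List[float]) -> List[float]:
    """Return p(k) = clip(sigma - ipn[k], 0, mask[k]) using up to `budget` power."""

    def alloc(sigma: float) -> List[float]:
        return [min(max(sigma - v, 0.0), m) for v, m in zip(ipn, mask)]

    if sum(mask) <= budget:
        # The spectral mask binds on every channel before the budget is spent.
        return list(mask)
    lo = min(ipn)                                # this level allocates zero power
    hi = max(v + m for v, m in zip(ipn, mask))   # every channel sits at its mask
    for _ in range(200):                         # bisection on the water level
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > budget:
            hi = mid
        else:
            lo = mid
    return alloc(lo)                             # lo never exceeds the budget


def iwf_step(p: List[List[float]], gains: List[List[List[float]]],
             noise: List[List[float]], budgets: List[float],
             mask: List[float]) -> List[List[float]]:
    """One simultaneous IWF update p^{t+1} = Phi(p^t).

    gains[j][i][k]: normalized gain from transmitter j to receiver i on channel k.
    noise[i][k]:    normalized noise power at receiver i on channel k.
    """
    n_users, n_ch = len(p), len(mask)
    new_p = []
    for i in range(n_users):
        ipn = [noise[i][k] + sum(gains[j][i][k] * p[j][k]
                                 for j in range(n_users) if j != i)
               for k in range(n_ch)]
        new_p.append(water_fill(ipn, budgets[i], mask))
    return new_p
```

Bisection is used because the total allocated power is a non-decreasing function of the water level; any other root-finding method for $\sigma_i$ would serve equally well.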

Sufficient conditions for convergence of the IWF algorithm and its various extensions have been widely
studied, for example, in [4], [6], [7]. Essentially, if the interference received (generated) at the receiver
(transmitter) of each user is weak enough compared with the desired signal, then the IWF converges.
When these conditions are not met, it is possible that the IWF diverges [8].

B. The Uncertainty of IPN and the Water-Filling Operator 

One of the key assumptions of the IWF based algorithms is that the receivers can always obtain the exact
values of the IPN on each channel in each iteration of the algorithm, and feed them back to the transmitters.
This assumption is not valid in real communication systems because the power of the noise/interference
experienced at the receivers needs to be estimated in each iteration, and is thus subject to time-varying
estimation errors [9], [10]. Therefore, in each iteration of the algorithm, we can only obtain a noisy
version of the true solution of (1), referred to as the noisy water-filling solution:

$$\hat{p}_i^{t+1}(k) = \left[\hat{\sigma}_i - \widehat{IPN}_i^t(k)\right]_0^{P_{mask}(k)} \triangleq \hat{\Phi}_i^k(\mathbf{p}_{-i}^t) \tag{2}$$

where $\widehat{IPN}_i^t(k)$ is the noisy (estimated) IPN for user $S_i$ on channel $k$. Note that the uncertainty of the
IPN leads to the inaccuracy of the dual variable, as now it should satisfy
$\sum_{k=1}^{K} \left[\hat{\sigma}_i - \widehat{IPN}_i^t(k)\right]_0^{P_{mask}(k)} = \bar{p}_i$, and $\sigma_i \neq \hat{\sigma}_i$ in general.

There is little work in the literature that addresses the impact of such time-varying uncertainty of the
IPN on the performance of the IWF algorithm. In [3], [4], a "relaxed" version of IWF (R-IWF) was
proposed to heuristically deal with inaccurate IPN levels. In each iteration, the transmission power levels
are computed as $\mathbf{p}^{t+1} = (1 - \lambda)\mathbf{p}^t + \lambda\Phi(\mathbf{p}^t)$, where $\lambda \in (0, 1]$ is a free parameter. Although it has
been shown in [4] that this algorithm converges under similar conditions as the IWF in situations without
IPN uncertainty, the effect of this algorithm in the presence of IPN inaccuracy is not clear, and as we
will see later in the simulation section, the performance of R-IWF depends strongly on the choice of
$\lambda$. In [11], a robust version of IWF is proposed to deal with errors related to changes in the number of
users and their mobility. The algorithm guarantees an acceptable level of performance under worst case
conditions (i.e., the maximum possible error of the IPN). This algorithm trades performance in favor
of robustness, thus the equilibrium solution obtained is generally less efficient than that of the original
IWF. In our work, we are concerned with reaching the equilibrium solution of the original IWF in the
presence of IPN uncertainty. In [12], the authors provide a probabilistically robust IWF to deal with the
quantization errors of the IPN at the receiver of each user. In this algorithm, users allocate their powers
to maximize their total rate for a large fraction of the error realizations. However, a specific distribution of
the error process is assumed in the derivation of the algorithm, and such statistical information is usually
not available in practice (as suggested in Section V of [11]). A recent work [13] proposes algorithms
for systems with finite-state Markov channels in interference networks. The channel itself is modeled as
time-varying in that work, and the objective is to track the time-varying equilibria. In the present paper,
the uncertainty is due to imperfect receiver estimation of the value of the IPN as opposed to
changes in the state of the channel.

In this correspondence, we propose an extension of the IWF algorithm that is robust in the presence
of time-varying IPN uncertainties. Specifically, we model the uncertainty regarding the IPN as
time-varying additive noise, and show that the proposed algorithm converges with probability 1 under some
conditions on the channel gains and the noise process. We verify the above claim by simulation, and
demonstrate the advantage of the proposed algorithm with respect to the original IWF and the R-IWF.
Additionally, we show by simulation that in some strong interference channels where the conventional IWF
algorithm diverges, our proposed algorithm still converges. This last result indicates that the convergence
condition of our algorithm may be further relaxed.

This correspondence is organized as follows. In Section II we introduce the proposed algorithm and
provide convergence analysis. In Section III we demonstrate the performance of the proposed algorithm
and compare the results with the conventional IWF. This correspondence concludes in Section IV.

II. Proposed Algorithm and Convergence Results 
In the proposed algorithm, in each iteration $t$, all the users compute their power allocations as follows:

1) Obtain $\{\widehat{IPN}_i^t(k)\}_{k\in\mathcal{K}}$, and calculate the noisy water-filling solution $\hat{\Phi}_i(\mathbf{p}_{-i}^t)$.

2) Calculate the power output according to the following policy:

$$\mathbf{p}_i^{t+1} = \begin{cases} \hat{\Phi}_i(\mathbf{p}_{-i}^t) & \text{for } t = 0 \\ (1 - \alpha_t)\mathbf{p}_i^t + \alpha_t \hat{\Phi}_i(\mathbf{p}_{-i}^t) & \text{for } t \ge 1 \end{cases} \tag{3}$$

where the elements of $\hat{\Phi}_i(\mathbf{p}_{-i}^t)$ are defined in (2). The sequence $\{\alpha_t : 0 < \alpha_t < 1\}_{t=1}^{\infty}$ satisfies the
following (define $\alpha_0 = 1$):

$$\lim_{T\to\infty} \sum_{t=0}^{T} \alpha_t = \infty, \qquad \lim_{T\to\infty} \sum_{t=0}^{T} \alpha_t^2 < \infty. \tag{4}$$

Note that from the last inequality in (4), we have $\lim_{t\to\infty} \alpha_t = 0$. The update procedure (3) is essentially
Mann's iteration (see [14] for its properties), which is designed for situations where conventional iterative
methods for finding the fixed point of a self-mapping (say Picard's method) fail. If we choose $\alpha_t = \frac{1}{t+1}$,
then the update policy in (3) can be rewritten as: $\mathbf{p}_i^{T+1} = \frac{1}{T+1}\sum_{t=0}^{T} \hat{\Phi}_i(\mathbf{p}_{-i}^t)$. Clearly $\mathbf{p}_i^{T+1}$ is an average
of the history of $S_i$'s water-filling solutions, hence the name Average Iterative Water-Filling (A-IWF)
for the proposed algorithm. This algorithm maintains the distributed nature of the original IWF, because
in each iteration $t+1$, $S_i$ only needs to know the set of IPN $\{\widehat{IPN}_i^t(k)\}_{k\in\mathcal{K}}$ as well as its own power
allocation $\{p_i^t(k)\}_{k\in\mathcal{K}}$ in iteration $t$ (both of which can be obtained locally by $S_i$), but does not need to
know the transmission power profiles of other users.
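
The equivalence between the Mann update with stepsize $1/(t+1)$ and a running average can be checked numerically. The sketch below is illustrative only: `mann_average` is a hypothetical helper, and the list `samples` stands in for the sequence of noisy water-filling outputs of one user on one channel. Note also that $\sum_t 1/(t+1)$ diverges while $\sum_t 1/(t+1)^2$ is finite, so this stepsize satisfies both requirements on the stepsize sequence.

```python
import random
from typing import List


def mann_average(samples: List[float]) -> float:
    """Run p^{t+1} = (1 - a_t) * p^t + a_t * x_t with a_t = 1 / (t + 1)."""
    p = 0.0  # the initial value is irrelevant because a_0 = 1
    for t, x in enumerate(samples):
        a_t = 1.0 / (t + 1)
        p = (1.0 - a_t) * p + a_t * x
    return p


rng = random.Random(0)
xs = [rng.uniform(0.0, 3.0) for _ in range(1000)]  # stand-in water-filling outputs
running = mann_average(xs)
direct = sum(xs) / len(xs)
print(abs(running - direct) < 1e-9)
```

The recursive form needs only the previous iterate and the newest sample, which is why each user can run it locally without storing the full history.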

We see that the main difference between the proposed algorithm and the previously mentioned R-IWF
is that we use a set of diminishing and iteration dependent stepsizes $\{\alpha_t\}$ that satisfies (4), instead of
the fixed stepsize $\lambda$. We will see later that it is exactly these properties of $\{\alpha_t\}$ that guarantee the
convergence of A-IWF under IPN uncertainty.

We model the noisy IPN for user $S_i$ on channel $k$ as: $\widehat{IPN}_i^t(k) = IPN_i^t(k) + e_i^t(k)$, where $e_i^t(k)$
represents the estimation error of the true value $IPN_i^t(k)$. Let $\mathbf{e}_i^t \triangleq [e_i^t(1), \cdots, e_i^t(K)]^T$, and
$\mathbf{e}^t \triangleq [\mathbf{e}_1^t, \cdots, \mathbf{e}_N^t]^T$. Let $\mathcal{F}_i^t$ be defined as the filtration generated by
$\mathbf{p}_i^{t+1} \cup \{\mathbf{p}_i^{\tau}, \hat{\Phi}_i(\mathbf{p}_{-i}^{\tau})\}_{\tau=0}^{t}$. We assume the error process
to be zero mean, i.e., $E[e_i^t(k) \,|\, \mathcal{F}_i^{t-1}] = 0$. This assumption is reasonable because conditioning on the
knowledge of the desired signal ($\mathbf{p}^t$ in our case), the estimation error $e_i^t(k)$ of $IPN_i^t(k)$ can indeed
be viewed as a zero mean random variable using most conventional estimators (see Section V of [?] for a
detailed comparison of estimation biases for different algorithms). The above model is very general in
the sense that we do not assume the explicit forms of the algorithms that perform the estimation, nor do
we require that the error process $\{e_i^t(k)\}_{t=1}^{T}$ be independent of the history of the IPN up to time $T$, i.e.,
our model allows $\widehat{IPN}_i^t(k)$ to be calculated based on the previous or the current observations made by
the receiver of $S_i$.

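In the following, we use "w.p.1" to abbreviate "with probability 1". We need the following definition
before introducing Lemma 1, which characterizes the noisy version of the water-filling operator $\hat{\Phi}(\mathbf{p})$.
For any positive $N \times 1$ vector $\mathbf{w} = [w_1, \cdots, w_N]^T$ and the operator $\Phi(\mathbf{p})$, the (vector) block-maximum
norm $\|\cdot\|_{2,block}^{\mathbf{w}}$ is defined as [?]:

$$\|\Phi(\mathbf{p})\|_{2,block}^{\mathbf{w}} \triangleq \max_{i\in\mathcal{N}} \frac{\|\Phi_i(\mathbf{p}_{-i})\|_2}{w_i}.$$

Lemma 1: Define an $N \times N$ matrix $\mathbf{T}$ related to the channel gains as:

$$[\mathbf{T}]_{i,j} \triangleq \begin{cases} \max_{k\in\mathcal{K}} |\bar{H}_{j,i}(k)|^2 & \text{if } j \neq i \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

Let $\rho(\mathbf{T})$ be the spectral radius of the matrix $\mathbf{T}$. Then if $\rho(\mathbf{T}) < 1$, there must exist a positive vector
$\mathbf{w}$, and a constant $\beta$ that satisfies $0 < \beta < 1$, such that for any feasible $\mathbf{p}^1, \mathbf{p}^2 \in \mathcal{P}$,

$$\|\hat{\Phi}(\mathbf{p}^1) - \Phi(\mathbf{p}^2)\|_{2,block}^{\mathbf{w}} \le \beta \|\mathbf{p}^1 - \mathbf{p}^2\|_{2,block}^{\mathbf{w}} + \|\mathbf{e}\|_{2,block}^{\mathbf{w}}. \tag{6}$$

Proof: The proof is similar to that of Proposition 2 of [4]. Please see Appendix A for details.

We note here that it has been proven in [4] that when $\rho(\mathbf{T}) < 1$ is true, the original water-filling
operator $\Phi(\mathbf{p})$ is a contraction with coefficient $\beta < 1$, and hence has a unique fixed point, i.e., there
exists a unique $\mathbf{p}^* \in \mathcal{P}$ such that $\mathbf{p}^* = \Phi(\mathbf{p}^*)$.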
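
To make the condition $\rho(\mathbf{T}) < 1$ concrete, the following sketch builds a matrix with entries $\max_k |H_{j,i}(k)|^2 / |H_{i,i}(k)|^2$ off the diagonal and estimates its spectral radius. Everything here is an assumption for illustration: the helper names `build_T` and `spectral_radius`, the nested-list layout `gain[j][i][k]`, and the choice of power iteration on $\mathbf{T} + \mathbf{I}$ (which has spectral radius $\rho(\mathbf{T}) + 1$ when $\mathbf{T}$ is entrywise non-negative, and avoids the oscillation that plain power iteration can show on zero-diagonal matrices).

```python
from typing import List

Matrix = List[List[float]]


def build_T(gain: List[List[List[float]]]) -> Matrix:
    """gain[j][i][k] = |H_{j,i}(k)|^2, i.e. transmitter j to receiver i on channel k."""
    n = len(gain)
    n_ch = len(gain[0][0])
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Normalized cross gain, maximized over the channels.
                T[i][j] = max(gain[j][i][k] / gain[i][i][k] for k in range(n_ch))
    return T


def spectral_radius(T: Matrix, iters: int = 500) -> float:
    """Power iteration on T + I; intended for entrywise non-negative T."""
    n = len(T)
    x = [1.0] * n
    rho_shifted = 1.0
    for _ in range(iters):
        y = [x[i] + sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
        rho_shifted = max(y)           # x is scaled so that max(x) == 1
        x = [v / rho_shifted for v in y]
    return rho_shifted - 1.0
```

A returned value below 1 indicates that the sufficient condition of Lemma 1 holds for the given gains; a value of 1 or more says nothing either way, since the condition is only sufficient.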

We then characterize the convergence property of the A-IWF algorithm under two different assumptions
on the noise process $\{\mathbf{e}^t\}_{t=0}^{\infty}$. For simplicity of notation, in the following, we use $\|\cdot\|$ to denote the norm
$\|\cdot\|_{2,block}^{\mathbf{w}}$, where $\mathbf{w}$ is the positive vector obtained from the proof of Lemma 1.

Theorem 1: Assume $\rho(\mathbf{T}) < 1$, $\{\alpha_t\}$ satisfies (4), and $\{\mathbf{e}^t\}$ satisfies $\sum_{t=1}^{\infty} \alpha_t \|\mathbf{e}^t\| < \infty$, w.p.1.
Then the sequence of power profiles $\{\mathbf{p}^t\}_{t=1}^{\infty}$ generated by the A-IWF algorithm converges to the unique
fixed point of the original mapping $\Phi(\cdot)$, denoted by $\mathbf{p}^*$. More precisely, we have: $\|\mathbf{p}^t - \mathbf{p}^*\| \to 0$ w.p.1.

Proof: Please see Appendix B for the proof.

Theorem 2: Assume $\rho(\mathbf{T}) < 1$, $\{\alpha_t\}$ satisfies (4), and the error process satisfies $\lim_{t\to\infty} \|\mathbf{e}^t\| = 0$,
w.p.1. Then we have: $\|\mathbf{p}^t - \mathbf{p}^*\| \to 0$ w.p.1.

Proof: Please see Appendix C for the proof.

At this point, we would like to give some remarks regarding the above convergence results.

Remark 1: The condition $\rho(\mathbf{T}) < 1$, which is a restriction on the channel gains, coincides with the
condition that ensures the convergence of IWF without the IPN uncertainties in Theorem 1 of [?]. We
refer the readers to [?] for the physical interpretation as well as the comparison of this condition with other
similar conditions derived in the literature, e.g., those in [6] and [?].

Remark 2: We will show in Section III-B that in many cases when $\rho(\mathbf{T}) < 1$ is not satisfied, our
algorithm still converges. This suggests that the A-IWF algorithm may converge under more relaxed
conditions than the one stated in this correspondence. We will leave this task as a future research topic.

Theorem 1 and Theorem 2 differ in their respective restrictions on the error process $\{\mathbf{e}^t\}$, as
technically the conditions $\sum_{t=1}^{\infty} \alpha_t \|\mathbf{e}^t\| < \infty$ and $\lim_{t\to\infty} \|\mathbf{e}^t\| = 0$ do not imply each other. Although
these conditions require that the error process be diminishing, we do observe in our simulations (to be
shown in Section III) that the A-IWF converges in the presence of more general forms of noise, for
example noise with zero mean and bounded second moment. This observation leads us to believe that
the above conditions on the error process are overly restrictive. Such belief is partially justified as follows.

Assume $E[\mathbf{e}_i^t \,|\, \mathcal{F}_i^{t-1}] = 0$, and $\mathbf{e}_i^t$ has bounded second moment for all $i$. Further assume $\hat{\Phi}(\mathbf{p}^t)$ can be
approximated as: $\hat{\Phi}(\mathbf{p}^t) = \Phi(\mathbf{p}^t) + \boldsymbol{\xi}^t$, where the elements of the bias vector $\boldsymbol{\xi}^t$ satisfy:

$$E[\xi_i^t(k) \,|\, \mathcal{F}_i^{t-1}] = 0 \quad \text{and} \quad E[(\xi_i^t(k))^2 \,|\, \mathcal{F}_i^{t-1}] < \infty. \tag{7}$$

Then we have the following convergence result. See Appendix D for the proof.

Theorem 3: Suppose $\hat{\Phi}(\mathbf{p}^t)$ is approximated as $\hat{\Phi}(\mathbf{p}^t) = \Phi(\mathbf{p}^t) + \boldsymbol{\xi}^t$ with the elements of $\boldsymbol{\xi}^t$ satisfying (7).
If $\Phi(\cdot)$ is a contraction with constant $\beta$, and if $\{\alpha_t\}_{t=1}^{\infty}$ satisfies (4), then we have: $\|\mathbf{p}^t - \mathbf{p}^*\| \to 0$ w.p.1.

Theorem 3 essentially says that if the above approximation of the noisy water-filling solution is accurate,
then we only require the error process $\{\mathbf{e}^t\}$ to have mean zero and bounded second moments to ensure
the convergence of the algorithm. Note that in this case the bias vector $\boldsymbol{\xi}^t$ summarizes the uncertainties
regarding both the IPNs and the dual variables. The key assumption here is that $E[\xi_i^T(k) \,|\, \mathcal{F}_i^{T-1}] = 0$, $\forall\, i, k$,
i.e., based on all the knowledge it has about the evolution of the algorithm until time $T-1$, a particular
user $S_i$ predicts that the biases $\{\xi_i^T(k)\}_k$ are zero mean. The following empirical experiments show that
this assumption is approximately true.

Consider a network with 10 users and 32 channels. Let $\bar{p}_i = 10$, $P_{mask}(k) = 3$, $\forall\, k \in \mathcal{K}$. We define
the bias of the noisy water-filling solution as: $\xi_i^t(k) \triangleq \hat{\Phi}_i^k(\mathbf{p}_{-i}^t) - \Phi_i^k(\mathbf{p}_{-i}^t)$.

We simplify the analysis a bit by assuming the bias process to be Markovian, i.e.,
$E[\xi_i^t(k) \,|\, \mathcal{F}_i^{t-1}] = E[\xi_i^t(k) \,|\, \mathbf{p}^t]$. We investigate the distribution of $\{E[\xi_i(k) \,|\, \mathbf{p}]\}_{i,k}$. Define the variance of
the noise $e_i(k)$ as $var_i(k)$; introduce a term called Interference Error Ratio (IER) to quantify the strength
of the IPN error $e$: $IER_i(k) = 10\log_{10}(\ldots)$. We fix IER = 10 dB during the experiment. As $E[\xi_i(k) \,|\, \mathbf{p}]$
is a function of $\mathbf{p}$, we fix $\{\mathbf{p}_i \in \mathcal{P}_i\}_{i\in\mathcal{N}}$, and obtain an estimate of $\{E[\xi_i(k) \,|\, \mathbf{p}]\}_{i,k}$, denoted by
$\{M_i(k)\}_{i,k}$, by doing the following:
1) generate the channel gains $\{|H_{i,j}(k)|^2\}$ randomly;
2) generate $L$ samples of IPN noise vectors $\{e_i^{(l)}(k) \sim \mathcal{N}(0, var_i(k)),\ \forall\, i, k\}_{l=1}^{L}$;
3) obtain the biases $\{\xi_i^{(l)}(k)\}_{l=1}^{L}$ according to the definition above;
4) calculate $M_i(k) = \frac{1}{L}\sum_{l=1}^{L} \xi_i^{(l)}(k)$, $\forall\, i, k$.
We repeat the above procedure 1,000 times with randomly generated sets of $\{\mathbf{p}_i \in \mathcal{P}_i\}_{i\in\mathcal{N}}$, and plot the
empirical distribution of $\{E[\xi_i(k) \,|\, \mathbf{p}]\}_{i,k}$ in Fig. 1 (different graphs in Fig. 1 represent the results obtained
by experiments using different $L$). We see that as the estimates $\{M_i(k)\}_{i,k}$ become more accurate with a
larger number of samples (larger $L$), the empirical distribution of $\{E[\xi_i(k) \,|\, \mathbf{p}]\}_{i,k}$ becomes more
concentrated at zero. Thus we conjecture that asymptotically, as $L \to \infty$, $E[\xi_i(k) \,|\, \mathbf{p}]$ can be approximated
as zero for all $i$ and $k$.

Fig. 1. Empirical distribution of $\{E[\xi_i(k) \,|\, \mathbf{p}]\}_{i,k}$. Left: L = 1,000; Middle: L = 10,000; Right: L = 100,000.

Remark 3: We give some remarks comparing the convergence conditions of the conventional IWF and
A-IWF under uncertainty. From [16] (Chapter 12, Th. 12.2.1-12.2.5) we see that the condition
$\lim_{t\to\infty} \|\mathbf{e}^t\| = 0$ in Theorem 2 is sufficient and necessary for the conventional IWF to converge to the
fixed point without performing averaging. However, the conventional IWF may diverge under the condition
$\sum_{t=1}^{\infty} \alpha_t \|\mathbf{e}^t\| < \infty$ in Theorem 1 because this condition is not equivalent to $\lim_{t\to\infty} \|\mathbf{e}^t\| = 0$. Moreover,
from Th. 12.2.5 in [16], under the assumption in Theorem 3 the conventional IWF produces a sequence
that finally stays in a ball around the fixed point. However, the radius of such a ball increases with
$\frac{\max_t \|\boldsymbol{\xi}^t\|}{1-\beta}$. Notice that in this case $\|\boldsymbol{\xi}^t\|$ need not be decreasing, thus the maximum possible error of
the conventional IWF may be large (consider when $\beta$ is close to 1).
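
The contrast drawn in Remark 3 can be illustrated on a toy problem. The sketch below does not use the water-filling operator at all: as a loudly labeled stand-in it takes the scalar contraction $x \mapsto \beta x + c$ with zero-mean Gaussian noise added to every evaluation, and compares the plain (Picard) iteration with the averaged (Mann) iteration using stepsize $1/(t+1)$. The function name `run` and all parameter values are hypothetical.

```python
import random
from typing import Tuple


def run(beta: float, c: float, sigma: float, steps: int, seed: int = 0) -> Tuple[float, float]:
    """Return (RMS Picard error over the last 1000 steps, final Mann error)."""
    rng = random.Random(seed)
    x_star = c / (1.0 - beta)          # fixed point of the noiseless map
    x_picard = 0.0
    x_mann = 0.0
    tail_sq = 0.0
    for t in range(steps):
        noise = rng.gauss(0.0, sigma)  # zero mean, bounded second moment
        # Picard: apply the noisy map directly.
        x_picard = beta * x_picard + c + noise
        # Mann: average the noisy map output with the previous iterate.
        a_t = 1.0 / (t + 1)
        x_mann = (1.0 - a_t) * x_mann + a_t * (beta * x_mann + c + noise)
        if t >= steps - 1000:
            tail_sq += (x_picard - x_star) ** 2
    return (tail_sq / 1000.0) ** 0.5, abs(x_mann - x_star)


picard_rms, mann_err = run(beta=0.5, c=1.0, sigma=1.0, steps=20000)
print("Picard RMS error:", picard_rms)
print("Mann final error:", mann_err)
```

In this toy setting the Picard iterate keeps fluctuating around the fixed point with a spread that does not shrink, while the averaged iterate settles close to it, which mirrors the qualitative behaviour described above for the conventional IWF and the A-IWF.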

III. Simulation Results 
In this section we conduct three experiments to demonstrate the properties of the A-IWF algorithm. 

A. Performance with Estimation Error 

We simulate a network with 10 randomly located users and 64 channels. We choose the noise to be a
zero mean Gaussian random variable, $e_i^t(k) \sim \mathcal{N}(0, var_i^t(k))$; we choose the IER for all the users on
all the channels to be $IER_i^t(k) = 20$ dB and 15 dB; we choose the channel gains $\{|H_{i,j}(k)|^2\}$ randomly and
appropriately such that $\rho(\mathbf{T}) < 1$ is satisfied; we choose a stepsize sequence $\{\alpha_t\}$ satisfying (4). For ease
of demonstration, different algorithms are examined with the same starting points.



Fig. 2. Comparison of the output for different algorithms, IER = 20 dB.

In Fig. 2 we show the power output produced by various algorithms for a particular user on a particular
channel, with IER = 20 dB. It is clear that in the presence of estimation error, the IWF algorithm
produces a sequence of noisy power profiles which exhibits no sign of convergence. We also show the
performance of the IWF algorithm without estimation error, for the purpose of comparison. It is seen that the
A-IWF algorithm converges quickly to the unique NE predicted by the IWF (without estimation error).
In Fig. 2 we also show the output of the R-IWF algorithm with various values of $\lambda$. We observe that
when $\lambda$ is large, the output is still noisy, while when $\lambda$ is small, the convergence is slow. The point is
that the choice of $\lambda$ is important for the performance of R-IWF, but it is difficult to correctly choose $\lambda$
to guarantee both robustness and fast convergence. In Fig. 3 we compare the selected power profiles of
R-IWF and A-IWF when IER = 15 dB.

Fig. 3. Comparison of different algorithms, IER = 15 dB.

Fig. 4. Comparison of convergence speed of IWF and A-IWF.

B. Performance with Strong Interference 

As stated above, the convergence of the IWF in ideal situations usually depends on the weak
interference condition. It is observed that in systems with strong interference, the IWF algorithm diverges
[8]. In the following simulation, we demonstrate several scenarios in which the IWF diverges, but the
A-IWF algorithm converges. The purpose of these simulations is to argue that the A-IWF may need
weaker conditions for convergence.

Consider the following scenario of strong interference (Example 5 in [?]). Suppose there are 3 users
and 2 channels in the system, with channel matrices $\mathbf{H}(k)$ expressed as follows:

$$\mathbf{H}(1) = \mathbf{H}(2) = \ldots$$

where each element of the matrix $\mathbf{H}(k)$ is defined as $[\mathbf{H}(k)]_{i,j} = |H_{i,j}(k)|^2$. Set the noise power on
channel 1 to be $\sigma^2$, and the noise power on channel 2 to be $\sigma^2 + \bar{p}_i$, with $\bar{p}_i = 10$, for all $i \in \mathcal{N}$. There is a
unique NE of this game, in which each user allocates two-thirds of its power to channel 1 and the rest
to channel 2. The left hand side part of Fig. 5 shows the power profiles of user 1 on channel 1 that
are produced by different algorithms (with the same starting point). It is seen that the IWF algorithm
oscillates, while the A-IWF algorithm converges quickly. Similar results are obtained in the right hand
side part of Fig. 5 with the following settings:

$$\mathbf{H}(1) = \ldots, \quad \mathbf{H}(2) = \ldots \tag{8}$$

and the noise power on both channels set to be the same. We observe again that the performance of the
R-IWF algorithm is very sensitive to the choice of $\lambda$: when $0.6 \le \lambda \le 1$, the output oscillates; when
$0 < \lambda \le 0.5$, the output converges, with larger $\lambda$ giving faster convergence. However, it is not clear what
rules one should follow in general to select such a critical parameter.
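
The oscillation of simultaneous IWF under strong interference, and the stabilizing effect of averaging, can be reproduced on a much smaller toy example than the ones above. The sketch below is not the 3-user scenario of this section, whose channel matrices are not reproduced here. It is a hypothetical symmetric game with 2 users, 2 channels, normalized cross gain 2, unit noise, unit power budget and spectral mask 2, with both users starting on channel 1. The helper names are assumptions, and the toy game may have other equilibria besides the symmetric one that averaging reaches.

```python
from typing import List


def water_fill(ipn: List[float], budget: float, mask: List[float]) -> List[float]:
    """Clip(sigma - ipn, 0, mask) with the water level sigma found by bisection."""
    def alloc(sigma: float) -> List[float]:
        return [min(max(sigma - v, 0.0), m) for v, m in zip(ipn, mask)]
    if sum(mask) <= budget:
        return list(mask)
    lo, hi = min(ipn), max(v + m for v, m in zip(ipn, mask))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > budget:
            hi = mid
        else:
            lo = mid
    return alloc(lo)


def simulate(averaged: bool, steps: int = 50) -> List[float]:
    """Track user 1's power on channel 1 for IWF (averaged=False) or A-IWF."""
    cross, noise, budget, mask = 2.0, 1.0, 1.0, [2.0, 2.0]  # hypothetical values
    p = [[1.0, 0.0], [1.0, 0.0]]
    history = []
    for t in range(steps):
        # Each user's best response to the other user's current allocation.
        phi = [water_fill([noise + cross * q for q in p[1 - i]], budget, mask)
               for i in range(2)]
        a_t = 1.0 / (t + 1) if averaged else 1.0
        p = [[(1.0 - a_t) * p[i][k] + a_t * phi[i][k] for k in range(2)]
             for i in range(2)]
        history.append(p[0][0])
    return history


iwf = simulate(averaged=False)
aiwf = simulate(averaged=True)
print("IWF, last two iterates:  ", iwf[-2:])
print("A-IWF, last two iterates:", aiwf[-2:])
```

With these values both users jump together between the two channels under plain IWF, whereas the averaged update settles on an equal split of power across the two channels.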



[Figure 5: two panels. Legend entries: A-IWF; R-IWF with λ = 0.8, 0.6, 0.5, 0.2, 0.1.]

Fig. 5. Convergence properties of different algorithms in strong interference channels.



C. Convergence In Ideal Cases 

Questions may arise as to how the A-IWF performs in situations when the water-filling solution
in (1) can be carried out accurately. As shown in [?], the IWF algorithm converges linearly in this
ideal scenario. Theoretically, we can only show that A-IWF converges sublinearly in the ideal scenario, i.e.,
$\lim_{t\to\infty} \frac{\|\mathbf{p}^{t+1} - \mathbf{p}^*\|}{\|\mathbf{p}^t - \mathbf{p}^*\|} = 1$. However, we observe for various randomly generated channel gains and random
starting points of the algorithms that the A-IWF algorithm seems to always converge as fast as the IWF
algorithm. Fig. 4 shows such an instance of this experiment. In this figure, we compare the power output
of selected users on selected channels (in a network with 10 users and 64 channels) generated by the
IWF and the A-IWF. It takes fewer than 10 iterations before the two algorithms agree with each other. Note
that the dotted lines represent the output of the IWF algorithm and the solid lines represent the output
of the A-IWF algorithm.

IV. Conclusion

In this correspondence, we proposed an extension to the IWF algorithm which is more robust and has
better convergence properties. We proved that the proposed algorithm converges w.p.1 under suitable
assumptions. We argued that this algorithm is indeed robust against time-varying estimation error of the
power of interference plus noise that is needed for the IWF computation. We also
showed by simulation that the proposed algorithm converges when strong interference is present in the
communication channel, a scenario in which the IWF algorithm diverges. An interesting future research
topic is to develop a possibly more general condition for the convergence of the proposed algorithm.

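Appendix A
Proof of Lemma 1

Proof: Define $\bar{\mathbf{H}}_{j,i} \triangleq \mathrm{diag}(|\bar{H}_{j,i}(1)|^2, \cdots, |\bar{H}_{j,i}(K)|^2)$; define $\widehat{\mathbf{IPN}}_i \triangleq [\widehat{IPN}_i(1), \cdots, \widehat{IPN}_i(K)]^T$,
and define $\mathbf{IPN}_i$ similarly. From Corollary 3 of [?], we have that the water-filling operator $\Phi_i(\mathbf{p}_{-i})$
can be expressed as the projection of $-\mathbf{IPN}_i$ onto the space $\mathcal{P}_i$, i.e., $\Phi_i(\mathbf{p}_{-i}) = [-\mathbf{IPN}_i]_{\mathcal{P}_i}$. Similarly,
we have that $\hat{\Phi}_i(\mathbf{p}_{-i}) = [-\widehat{\mathbf{IPN}}_i]_{\mathcal{P}_i}$. Consequently, we have:

$$\begin{aligned}
\|\hat{\Phi}_i(\mathbf{p}_{-i}^1) - \Phi_i(\mathbf{p}_{-i}^2)\|_2
&\overset{(a)}{\le} \Big\| -\sum_{j\neq i} \bar{\mathbf{H}}_{j,i}\mathbf{p}_j^1 - \mathbf{e}_i + \sum_{j\neq i} \bar{\mathbf{H}}_{j,i}\mathbf{p}_j^2 \Big\|_2 \\
&\le \sum_{j\neq i} \|\bar{\mathbf{H}}_{j,i}\|_2 \|\mathbf{p}_j^1 - \mathbf{p}_j^2\|_2 + \|\mathbf{e}_i\|_2 \\
&\overset{(b)}{=} \sum_{j\neq i} \Big(\max_{k} |\bar{H}_{j,i}(k)|^2\Big) \|\mathbf{p}_j^1 - \mathbf{p}_j^2\|_2 + \|\mathbf{e}_i\|_2, \quad \forall\, i \in \mathcal{N}
\end{aligned} \tag{9}$$

where (a) is because of the non-expansiveness of the projection operator under the Euclidean norm; (b) is
due to the fact that the 2-norm of a diagonal matrix equals the maximum absolute value of its diagonal
elements. Define $e_{\Phi_i} \triangleq \|\hat{\Phi}_i(\mathbf{p}_{-i}^1) - \Phi_i(\mathbf{p}_{-i}^2)\|_2$ and $e_{p_i} \triangleq \|\mathbf{p}_i^1 - \mathbf{p}_i^2\|_2$, and let
$\mathbf{e}_{\Phi} \triangleq [e_{\Phi_1}, \cdots, e_{\Phi_N}]^T$, $\mathbf{e}_p \triangleq [e_{p_1}, \cdots, e_{p_N}]^T$, and $\mathbf{e}_e \triangleq [\|\mathbf{e}_1\|_2, \cdots, \|\mathbf{e}_N\|_2]^T$.

In order to proceed, we define the vector weighted maximum norm [15] as:

$$\|\mathbf{x}\|_{\infty,vec}^{\mathbf{w}} \triangleq \max_{i} \frac{|x_i|}{w_i}, \quad \mathbf{w} > 0,\ \mathbf{x} \in \mathbb{R}^N \tag{10}$$

and the matrix weighted maximum norm as:

$$\|\mathbf{A}\|_{\infty,mat}^{\mathbf{w}} \triangleq \max_{i} \frac{1}{w_i} \sum_{j} |[\mathbf{A}]_{i,j}|\, w_j, \quad \mathbf{w} > 0,\ \mathbf{A} \in \mathbb{R}^{N\times N}. \tag{11}$$

Notice that from the definitions of the norms $\|\cdot\|_{\infty,vec}^{\mathbf{w}}$, $\|\cdot\|_{\infty,mat}^{\mathbf{w}}$ and the block-maximum norm, we have the
following equivalences:

$$\begin{aligned}
\|\mathbf{e}_p\|_{\infty,vec}^{\mathbf{w}} &= \max_i \frac{\|\mathbf{p}_i^1 - \mathbf{p}_i^2\|_2}{w_i} = \|\mathbf{p}^1 - \mathbf{p}^2\|_{2,block}^{\mathbf{w}} \\
\|\mathbf{e}_{\Phi}\|_{\infty,vec}^{\mathbf{w}} &= \max_i \frac{\|\hat{\Phi}_i(\mathbf{p}_{-i}^1) - \Phi_i(\mathbf{p}_{-i}^2)\|_2}{w_i} = \|\hat{\Phi}(\mathbf{p}^1) - \Phi(\mathbf{p}^2)\|_{2,block}^{\mathbf{w}} \\
\|\mathbf{e}_e\|_{\infty,vec}^{\mathbf{w}} &= \max_i \frac{\|\mathbf{e}_i\|_2}{w_i} = \|\mathbf{e}\|_{2,block}^{\mathbf{w}}.
\end{aligned} \tag{12}$$

The set of $N$ inequalities in (9) can be concisely written in vector form as ($\mathbf{T}$ is defined in (5)):
$\mathbf{e}_{\Phi} \le \mathbf{T}\mathbf{e}_p + \mathbf{e}_e$. Applying the vector weighted maximum norm to this inequality results in:

$$\begin{aligned}
\|\mathbf{e}_{\Phi}\|_{\infty,vec}^{\mathbf{w}} &\le \|\mathbf{T}\|_{\infty,mat}^{\mathbf{w}} \|\mathbf{e}_p\|_{\infty,vec}^{\mathbf{w}} + \|\mathbf{e}_e\|_{\infty,vec}^{\mathbf{w}} \\
&= \|\mathbf{T}\|_{\infty,mat}^{\mathbf{w}} \|\mathbf{p}^1 - \mathbf{p}^2\|_{2,block}^{\mathbf{w}} + \|\mathbf{e}\|_{2,block}^{\mathbf{w}}.
\end{aligned} \tag{13}$$

Arguing similarly as in the derivation of Proposition 2 of [4], by using (12) and (13) we have:

$$\|\hat{\Phi}(\mathbf{p}^1) - \Phi(\mathbf{p}^2)\|_{2,block}^{\mathbf{w}} = \|\mathbf{e}_{\Phi}\|_{\infty,vec}^{\mathbf{w}} \le \|\mathbf{T}\|_{\infty,mat}^{\mathbf{w}} \|\mathbf{p}^1 - \mathbf{p}^2\|_{2,block}^{\mathbf{w}} + \|\mathbf{e}\|_{2,block}^{\mathbf{w}}. \tag{14}$$

Since $\mathbf{T}$ is a non-negative matrix, from [15] Corollary 6.1, we have that there exists a $\mathbf{w}$ such that
$\rho(\mathbf{T}) < 1 \Leftrightarrow \|\mathbf{T}\|_{\infty,mat}^{\mathbf{w}} < 1$. Consequently, we conclude that if $\rho(\mathbf{T}) < 1$, then there must exist a
$\beta \in (0, 1)$ and a positive vector $\mathbf{w}$ that satisfy (6). ■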
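
The role of the weight vector in the two norms used in this proof can be seen on a small numerical example. The sketch below is illustrative: `vec_norm` and `mat_norm` are hypothetical helper names implementing the weighted maximum norms (10) and (11), and the 2 x 2 matrix is an arbitrary non-negative example with spectral radius 0.8, not one derived from any channel in this correspondence.

```python
from typing import List


def vec_norm(x: List[float], w: List[float]) -> float:
    """Weighted maximum norm of a vector: max_i |x_i| / w_i."""
    return max(abs(xi) / wi for xi, wi in zip(x, w))


def mat_norm(A: List[List[float]], w: List[float]) -> float:
    """Weighted maximum norm of a matrix: max_i (1 / w_i) * sum_j |A_ij| * w_j."""
    return max(sum(abs(a) * wj for a, wj in zip(row, w)) / wi
               for row, wi in zip(A, w))


T = [[0.0, 1.6], [0.4, 0.0]]     # eigenvalues are +0.8 and -0.8
print(mat_norm(T, [1.0, 1.0]))   # unweighted: largest row sum, above 1
print(mat_norm(T, [2.0, 1.0]))   # weights proportional to the Perron vector, below 1
```

With uniform weights the norm of this matrix exceeds 1 even though its spectral radius is below 1; choosing the weights proportional to the positive eigenvector brings the norm down to the spectral radius, which is the kind of weight vector whose existence the proof relies on.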

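Appendix B
Proof of Theorem 1

Proof: Starting from an arbitrary initial point $\mathbf{p}^0 \in \mathcal{P}$, the magnitude of the difference between $\mathbf{p}^1$
and the fixed point $\mathbf{p}^*$ can be expressed as:

$$\begin{aligned}
\|\mathbf{p}^1 - \mathbf{p}^*\| &= \|(1-\alpha_0)\mathbf{p}^0 + \alpha_0\hat{\Phi}(\mathbf{p}^0) - \mathbf{p}^*\| \\
&\le \|(1-\alpha_0)(\mathbf{p}^0 - \mathbf{p}^*)\| + \alpha_0\|\hat{\Phi}(\mathbf{p}^0) - \Phi(\mathbf{p}^*)\| \\
&\overset{(i)}{\le} (1-\alpha_0)\|\mathbf{p}^0 - \mathbf{p}^*\| + \alpha_0\beta\|\mathbf{p}^0 - \mathbf{p}^*\| + \|\alpha_0\mathbf{e}^0\| \\
&= \Big(1 - \alpha_0(1-\beta) + \frac{\|\alpha_0\mathbf{e}^0\|}{\|\mathbf{p}^0 - \mathbf{p}^*\|}\Big)\|\mathbf{p}^0 - \mathbf{p}^*\|
\end{aligned} \tag{15}$$

where (i) is from Lemma 1. Let us denote $\mu_t \triangleq 1 - \alpha_t(1-\beta)$. From (3) and (4), clearly we have
$\alpha_0 = 1$ and $\alpha_t < 1$, which implies $\mu_0 \le \mu_t$, $\forall\, t$. By induction, we show that in general:

$$\|\mathbf{p}^T - \mathbf{p}^*\| \le \Big(\prod_{t=0}^{T-1}\mu_t + \sum_{t=0}^{T-1}\Big(\prod_{j=t}^{T-1}\mu_j\Big)\frac{\|\alpha_t\mathbf{e}^t\|}{\mu_0\|\mathbf{p}^0 - \mathbf{p}^*\|}\Big)\|\mathbf{p}^0 - \mathbf{p}^*\|. \tag{16}$$

Clearly from (15), at time $T = 1$, (16) is true. Suppose at time $T$, (16) is true. At time $T+1$, we have:

$$\begin{aligned}
\|\mathbf{p}^{T+1} - \mathbf{p}^*\| &\le (1-\alpha_T)\|\mathbf{p}^T - \mathbf{p}^*\| + \alpha_T\beta\|\mathbf{p}^T - \mathbf{p}^*\| + \|\alpha_T\mathbf{e}^T\| \\
&\le \Big(\mu_T\prod_{t=0}^{T-1}\mu_t + \mu_T\sum_{t=0}^{T-1}\Big(\prod_{j=t}^{T-1}\mu_j\Big)\frac{\|\alpha_t\mathbf{e}^t\|}{\mu_0\|\mathbf{p}^0 - \mathbf{p}^*\|} + \frac{\|\alpha_T\mathbf{e}^T\|}{\|\mathbf{p}^0 - \mathbf{p}^*\|}\Big)\|\mathbf{p}^0 - \mathbf{p}^*\| \\
&\le \Big(\prod_{t=0}^{T}\mu_t + \sum_{t=0}^{T}\Big(\prod_{j=t}^{T}\mu_j\Big)\frac{\|\alpha_t\mathbf{e}^t\|}{\mu_0\|\mathbf{p}^0 - \mathbf{p}^*\|}\Big)\|\mathbf{p}^0 - \mathbf{p}^*\|.
\end{aligned} \tag{17}$$

Note that in the last inequality, we have used the fact that $\mu_T \ge \mu_0$, and hence
$\|\alpha_T\mathbf{e}^T\| \le \frac{\mu_T}{\mu_0}\|\alpha_T\mathbf{e}^T\|$. From the assumption $\sum_{t=1}^{\infty}\alpha_t\|\mathbf{e}^t\| < \infty$, w.p.1, there must exist some
constant $0 < b < \infty$ such that:

$$\lim_{T\to\infty}\sum_{t=0}^{T}\|\alpha_t\mathbf{e}^t\| < b < \infty \quad \text{w.p.1.} \tag{18}$$

In the following, we show $\lim_{T\to\infty}\sum_{t=0}^{T-1}\big(\prod_{j=t}^{T-1}\mu_j\big)\|\alpha_t\mathbf{e}^t\| = 0$ w.p.1.
First note that we have $\lim_{T\to\infty}\prod_{t=0}^{T}\mu_t = 0$, because:

$$\lim_{T\to\infty}\log\Big(\prod_{t=0}^{T}\mu_t\Big) = \lim_{T\to\infty}\sum_{t=0}^{T}\log\big(1 + (-\alpha_t(1-\beta))\big) \overset{(i)}{\le} \lim_{T\to\infty}(1-\beta)\sum_{t=0}^{T}(-\alpha_t) \overset{(ii)}{=} -\infty \tag{19}$$

where (i) is because $-1 < -\alpha_t(1-\beta)$ and the fact that $\log(1+x) \le x$, $\forall\, x > -1$; (ii) is because of (4)
and $\beta < 1$. Clearly (19) implies $\lim_{T\to\infty}\prod_{t=0}^{T}\mu_t = 0$. Thus for any $\delta > 0$, and a fixed $T$, there exists
$f(T, \delta) > T$ such that:

$$\prod_{t=T}^{N-1}\mu_t < \frac{\delta}{2b}, \quad \forall\, N > f(T, \delta). \tag{20}$$

From (18) we have that for any $\delta > 0$, there exists $T(\delta)$ such that:

$$\sum_{t=T}^{\infty}\|\alpha_t\mathbf{e}^t\| < \frac{\delta}{2}, \quad \forall\, T > T(\delta), \text{ w.p.1.} \tag{21}$$

Then we have that for all $N > \max\{T(\delta), f(T(\delta), \delta)\} = f(T(\delta), \delta)$:

$$\begin{aligned}
\sum_{t=0}^{N-1}\Big(\prod_{j=t}^{N-1}\mu_j\Big)\|\alpha_t\mathbf{e}^t\|
&= \sum_{t=0}^{T(\delta)}\Big(\prod_{j=t}^{N-1}\mu_j\Big)\|\alpha_t\mathbf{e}^t\| + \sum_{t=T(\delta)+1}^{N-1}\Big(\prod_{j=t}^{N-1}\mu_j\Big)\|\alpha_t\mathbf{e}^t\| \\
&\overset{(i)}{\le} \sum_{t=0}^{T(\delta)}\Big(\prod_{j=T(\delta)}^{N-1}\mu_j\Big)\|\alpha_t\mathbf{e}^t\| + \frac{\delta}{2} \\
&\overset{(ii)}{=} \Big(\prod_{j=T(\delta)}^{N-1}\mu_j\Big)\sum_{t=0}^{T(\delta)}\|\alpha_t\mathbf{e}^t\| + \frac{\delta}{2} \\
&\overset{(iii)}{<} \frac{\delta}{2b}\, b + \frac{\delta}{2} = \delta \quad \text{w.p.1}
\end{aligned} \tag{22}$$

where (i) is because of (21) and the fact that $\prod_{j=t}^{N-1}\mu_j \le 1$ for all $t \le N-1$; (ii) is because $\prod_{j=T(\delta)}^{N-1}\mu_j$
is independent of $t$; (iii) is because of (18) and (20). Consequently, we have that:

$$\lim_{T\to\infty}\sum_{t=0}^{T-1}\Big(\prod_{j=t}^{T-1}\mu_j\Big)\|\alpha_t\mathbf{e}^t\| = 0 \quad \text{w.p.1.} \tag{23}$$

From (16), (23), and $\lim_{T\to\infty}\prod_{t=0}^{T}\mu_t = 0$, we conclude: $\lim_{t\to\infty}\|\mathbf{p}^t - \mathbf{p}^*\| = 0$ w.p.1. ■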

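Appendix C
Proof of Theorem 2

Proof: Due to space limit, we only show the proof for the case that $\alpha_t = \frac{1}{t+1}$. The proof for general
$\{\alpha_t\}$ can be obtained similarly. When taking $\alpha_t = \frac{1}{t+1}$, the A-IWF algorithm can be written compactly
as: $\mathbf{p}^{T+1} = \frac{1}{T+1}\sum_{t=0}^{T}\hat{\Phi}(\mathbf{p}^t)$. We can write:

$$\|\mathbf{p}^{T+1} - \mathbf{p}^*\| \le \frac{1}{T+1}\sum_{t=0}^{T}\|\hat{\Phi}(\mathbf{p}^t) - \Phi(\mathbf{p}^*)\| \overset{(i)}{\le} \frac{\beta}{T+1}\sum_{t=0}^{T}\|\mathbf{p}^t - \mathbf{p}^*\| + \frac{1}{T+1}\sum_{t=0}^{T}\|\mathbf{e}^t\| \tag{24}$$

where (i) is from Lemma 1. Suppose the sequence $\{\mathbf{p}^t\}$ does not converge to $\mathbf{p}^*$, i.e.,
$\limsup_{T\to\infty}\|\mathbf{p}^T - \mathbf{p}^*\| = \delta > 0$. Using the Stolz-Cesaro Theorem [17], we have that:

$$\limsup_{T\to\infty}\frac{\sum_{t=0}^{T}\|\mathbf{p}^t - \mathbf{p}^*\|}{T+1} \le \limsup_{T\to\infty}\|\mathbf{p}^T - \mathbf{p}^*\| = \delta; \qquad \lim_{T\to\infty}\frac{\sum_{t=0}^{T}\|\mathbf{e}^t\|}{T+1} = \lim_{T\to\infty}\|\mathbf{e}^T\| = 0, \text{ w.p.1.} \tag{25}$$

Taking limsup on both sides of (24), we have:

$$\limsup_{T\to\infty}\|\mathbf{p}^{T+1} - \mathbf{p}^*\| \le \limsup_{T\to\infty}\frac{\beta}{T+1}\sum_{t=0}^{T}\|\mathbf{p}^t - \mathbf{p}^*\| + \limsup_{T\to\infty}\frac{1}{T+1}\sum_{t=0}^{T}\|\mathbf{e}^t\| \tag{26}$$

which can be reduced to $\delta \le \beta\delta$ by applying (25). This contradicts the fact that $\beta < 1$. We then
conclude that $\limsup_{T\to\infty}\|\mathbf{p}^T - \mathbf{p}^*\| = 0$, which in turn implies $\lim_{T\to\infty}\|\mathbf{p}^T - \mathbf{p}^*\| = 0$. ■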

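Appendix D
Proof of Theorem 3

Due to space limit, we only show the proof for the case that $\alpha_t = \frac{1}{t+1}$. The proof for general $\{\alpha_t\}$
can be obtained similarly. We first state a lemma, the proof of which can be found in Appendix E.

Lemma 2: If $w^{t+1} = (1-\alpha_t)w^t + \alpha_t\xi^{t+1}$, $E[\xi^t \,|\, \mathcal{F}^{t-1}] = 0$, $E[(\xi^t)^2 \,|\, \mathcal{F}^{t-1}] = b$ is uniformly
bounded, and $\{\alpha_t\}$ satisfies (4), then we must have $\lim_{t\to\infty} w^t = 0$, w.p.1.

We are now ready to prove Theorem 3. The A-IWF algorithm can be compactly written as:
$\mathbf{p}^{T+1} = \frac{1}{T+1}\sum_{t=0}^{T}\hat{\Phi}(\mathbf{p}^t) = \frac{1}{T+1}\sum_{t=0}^{T}\Phi(\mathbf{p}^t) + \mathbf{w}^T$, where
$\mathbf{w}^T \triangleq \frac{1}{T+1}\sum_{t=0}^{T}\boldsymbol{\xi}^t = \big(1 - \frac{1}{T+1}\big)\mathbf{w}^{T-1} + \frac{1}{T+1}\boldsymbol{\xi}^T$. Note
that by applying the result of Lemma 2 we have $\lim_{T\to\infty}\mathbf{w}^T = 0$. Then the magnitude of the difference
between $\mathbf{p}^{T+1}$ and the unique fixed point of the mapping $\Phi(\cdot)$ can be expressed as:

$$\|\mathbf{p}^{T+1} - \mathbf{p}^*\| \le \frac{1}{T+1}\sum_{t=0}^{T}\|\Phi(\mathbf{p}^t) - \Phi(\mathbf{p}^*)\| + \|\mathbf{w}^T\|. \tag{27}$$

Suppose the sequence $\{\mathbf{p}^t\}$ does not converge to $\mathbf{p}^*$, then there must exist a $\delta > 0$ such that
$\limsup_{t\to\infty}\|\mathbf{p}^t - \mathbf{p}^*\| = \delta$. Using again the Stolz-Cesaro Theorem as in (25), and taking limsup on both sides of
(27), we have: $\limsup_{T\to\infty}\|\mathbf{p}^{T+1} - \mathbf{p}^*\| \le \limsup_{T\to\infty}\frac{\beta}{T+1}\sum_{t=0}^{T}\|\mathbf{p}^t - \mathbf{p}^*\| + \lim_{T\to\infty}\|\mathbf{w}^T\|$. This
inequality can be reduced to $\delta \le \beta\delta$, which contradicts the fact that $\beta < 1$. Thus we conclude that
$\lim_{t\to\infty}\|\mathbf{p}^t - \mathbf{p}^*\| = 0$, and that $\lim_{t\to\infty}\mathbf{p}^t = \mathbf{p}^*$.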

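Appendix E
Proof of Lemma 2

Proof: We have $w^{t+1} = w^t + \alpha_t(\xi^{t+1} - w^t)$. Consider the following iteration:

$$(w^{t+1})^2 = \big(w^t + \alpha_t(\xi^{t+1} - w^t)\big)^2 = (w^t)^2 + 2\alpha_t(\xi^{t+1} - w^t)w^t + \alpha_t^2(\xi^{t+1} - w^t)^2. \tag{28}$$

Then $E[(w^{t+1})^2 \,|\, \mathcal{F}^t]$ can be expressed as:

$$\begin{aligned}
E[(w^{t+1})^2 \,|\, \mathcal{F}^t] &= (w^t)^2 - 2\alpha_t(w^t)^2 + \alpha_t^2\big(E[(\xi^{t+1})^2 \,|\, \mathcal{F}^t] + (w^t)^2 - 2w^t E[\xi^{t+1} \,|\, \mathcal{F}^t]\big) \\
&\le (w^t)^2 - 2\alpha_t\Big(1 - \frac{\alpha_t}{2}\Big)(w^t)^2 + \alpha_t^2 b.
\end{aligned} \tag{29}$$

Notice that the term $2\alpha_t(1 - \frac{\alpha_t}{2}) > 0$ because $0 < \alpha_t \le 1$. We see that $\lim_{T\to\infty}\sum_{t=0}^{T}\alpha_t^2 b < \infty$
because $\sum_{t=1}^{\infty}\alpha_t^2 < \infty$. In order to proceed, we define the notion of a non-negative almost-supermartingale
[18]. Let $z_t$, $\beta_t$, $\xi_t$ and $\zeta_t$ be non-negative measurable random variables. The sequence $\{z_t\}$ is called a
non-negative almost-supermartingale if $E[z_{t+1} \,|\, \mathcal{F}^t] \le (1 + \beta_t)z_t + \xi_t - \zeta_t$. From Theorem 1 of [18], we
have that $\lim_{t\to\infty} z_t$ exists and is finite and $\sum_{t=1}^{\infty}\zeta_t < \infty$ w.p.1 if $\{\sum_{t=1}^{\infty}\beta_t < \infty,\ \sum_{t=1}^{\infty}\xi_t < \infty\}$.

Now it is clear that the sequence $\{(w^t)^2\}_{t=0}^{\infty}$ is a non-negative almost-supermartingale (take $z_t = (w^t)^2$,
$\beta_t = 0$, $\xi_t = \alpha_t^2 b$ and $\zeta_t = 2\alpha_t(1 - \frac{\alpha_t}{2})(w^t)^2$ in (29)), and according
to the above mentioned theorem we have the following results: 1) $\{(w^t)^2\}_{t=0}^{\infty}$ converges;
2) $\sum_{t=1}^{\infty} 2\alpha_t(1 - \frac{\alpha_t}{2})(w^t)^2 < \infty$ w.p.1. The second result implies that $\lim_{T\to\infty}\sum_{t=1}^{T}\alpha_t(w^t)^2 < \infty$. Combined with the
fact that $\sum_{t=0}^{\infty}\alpha_t = \infty$ and $\lim_{t\to\infty}\alpha_t = 0$, we have that $\liminf_{t\to\infty}(w^t)^2 = 0$. Moreover, we know
from the first result that the sequence $\{(w^t)^2\}_{t=0}^{\infty}$ converges, so it must converge to 0. ■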